{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multi-Layer Perceptron, MNIST\n",
    "---\n",
    "In this notebook, we will train an MLP to classify images from the [MNIST database](http://yann.lecun.com/exdb/mnist/) hand-written digit database.\n",
    "\n",
    "The process will be broken down into the following steps:\n",
    ">1. Load and visualize the data\n",
    "2. Define a neural network\n",
    "3. Train the model\n",
    "4. Evaluate the performance of our trained model on a test dataset!\n",
    "\n",
    "Before we begin, we have to import the necessary libraries for working with data and PyTorch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# import libraries\n",
    "import torch\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Load and Visualize the [Data](http://pytorch.org/docs/stable/torchvision/datasets.html)\n",
    "\n",
    "Downloading may take a few moments, and you should see your progress as the data is loading. You may also choose to change the `batch_size` if you want to load more data at a time.\n",
    "\n",
    "This cell will create DataLoaders for each of our datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The MNIST datasets are hosted on yann.lecun.com that has moved under CloudFlare protection\n",
    "# Run this script to enable the datasets download\n",
    "# Reference: https://github.com/pytorch/vision/issues/1938\n",
    "\n",
    "from six.moves import urllib\n",
    "opener = urllib.request.build_opener()\n",
    "opener.addheaders = [('User-agent', 'Mozilla/5.0')]\n",
    "urllib.request.install_opener(opener)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torchvision import datasets\n",
    "import torchvision.transforms as transforms\n",
    "\n",
    "# number of subprocesses to use for data loading\n",
    "num_workers = 0\n",
    "# how many samples per batch to load\n",
    "batch_size = 20\n",
    "\n",
    "# convert data to torch.FloatTensor\n",
    "transform = transforms.ToTensor()\n",
    "\n",
    "# choose the training and test datasets\n",
    "train_data = datasets.MNIST(root='data', train=True,\n",
    "                                   download=True, transform=transform)\n",
    "test_data = datasets.MNIST(root='data', train=False,\n",
    "                                  download=True, transform=transform)\n",
    "\n",
    "# prepare data loaders\n",
    "train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,\n",
    "    num_workers=num_workers)\n",
    "test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, \n",
    "    num_workers=num_workers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualize a Batch of Training Data\n",
    "\n",
    "The first step in a classification task is to take a look at the data, make sure it is loaded in correctly, then make any initial observations about patterns in that data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "    \n",
    "# obtain one batch of training images\n",
    "dataiter = iter(train_loader)\n",
    "images, labels = dataiter.next()\n",
    "images = images.numpy()\n",
    "\n",
    "# plot the images in the batch, along with the corresponding labels\n",
    "fig = plt.figure(figsize=(25, 4))\n",
    "for idx in np.arange(20):\n",
    "    ax = fig.add_subplot(2, 20/2, idx+1, xticks=[], yticks=[])\n",
    "    ax.imshow(np.squeeze(images[idx]), cmap='gray')\n",
    "    # print out the correct label for each image\n",
    "    # .item() gets the value contained in a Tensor\n",
    "    ax.set_title(str(labels[idx].item()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View an Image in More Detail"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img = np.squeeze(images[1])\n",
    "\n",
    "fig = plt.figure(figsize = (12,12)) \n",
    "ax = fig.add_subplot(111)\n",
    "ax.imshow(img, cmap='gray')\n",
    "width, height = img.shape\n",
    "thresh = img.max()/2.5\n",
    "for x in range(width):\n",
    "    for y in range(height):\n",
    "        val = round(img[x][y],2) if img[x][y] !=0 else 0\n",
    "        ax.annotate(str(val), xy=(y,x),\n",
    "                    horizontalalignment='center',\n",
    "                    verticalalignment='center',\n",
    "                    color='white' if img[x][y]<thresh else 'black')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Define the Network [Architecture](http://pytorch.org/docs/stable/nn.html)\n",
    "\n",
    "The architecture will be responsible for seeing as input a 784-dim Tensor of pixel values for each image, and producing a Tensor of length 10 (our number of classes) that indicates the class scores for an input image. This particular example uses two hidden layers and dropout to avoid overfitting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "## TODO: Define the NN architecture\n",
    "class Net(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        # linear layer (784 -> 1 hidden node)\n",
    "        self.fc1 = nn.Linear(28 * 28, 1)\n",
    "\n",
    "    def forward(self, x):\n",
    "        # flatten image input\n",
    "        x = x.view(-1, 28 * 28)\n",
    "        # add hidden layer, with relu activation function\n",
    "        x = F.relu(self.fc1(x))\n",
    "        return x\n",
    "\n",
    "# initialize the NN\n",
    "model = Net()\n",
    "print(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Specify [Loss Function](http://pytorch.org/docs/stable/nn.html#loss-functions) and [Optimizer](http://pytorch.org/docs/stable/optim.html)\n",
    "\n",
    "It's recommended that you use cross-entropy loss for classification. If you look at the documentation (linked above), you can see that PyTorch's cross entropy function applies a softmax funtion to the output layer *and* then calculates the log loss."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "## TODO: Specify loss and optimization functions\n",
    "\n",
    "# specify loss function\n",
    "criterion = None\n",
    "\n",
    "# specify optimizer\n",
    "optimizer = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Train the Network\n",
    "\n",
    "The steps for training/learning from a batch of data are described in the comments below:\n",
    "1. Clear the gradients of all optimized variables\n",
    "2. Forward pass: compute predicted outputs by passing inputs to the model\n",
    "3. Calculate the loss\n",
    "4. Backward pass: compute gradient of the loss with respect to model parameters\n",
    "5. Perform a single optimization step (parameter update)\n",
    "6. Update average training loss\n",
    "\n",
    "The following loop trains for 30 epochs; feel free to change this number. For now, we suggest somewhere between 20-50 epochs. As you train, take a look at how the values for the training loss decrease over time. We want it to decrease while also avoiding overfitting the training data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# number of epochs to train the model\n",
    "n_epochs = 30  # suggest training between 20-50 epochs\n",
    "\n",
    "model.train() # prep model for training\n",
    "\n",
    "for epoch in range(n_epochs):\n",
    "    # monitor training loss\n",
    "    train_loss = 0.0\n",
    "    \n",
    "    ###################\n",
    "    # train the model #\n",
    "    ###################\n",
    "    for data, target in train_loader:\n",
    "        # clear the gradients of all optimized variables\n",
    "        optimizer.zero_grad()\n",
    "        # forward pass: compute predicted outputs by passing inputs to the model\n",
    "        output = model(data)\n",
    "        # calculate the loss\n",
    "        loss = criterion(output, target)\n",
    "        # backward pass: compute gradient of the loss with respect to model parameters\n",
    "        loss.backward()\n",
    "        # perform a single optimization step (parameter update)\n",
    "        optimizer.step()\n",
    "        # update running training loss\n",
    "        train_loss += loss.item()*data.size(0)\n",
    "        \n",
    "    # print training statistics \n",
    "    # calculate average loss over an epoch\n",
    "    train_loss = train_loss/len(train_loader.dataset)\n",
    "\n",
    "    print('Epoch: {} \\tTraining Loss: {:.6f}'.format(\n",
    "        epoch+1, \n",
    "        train_loss\n",
    "        ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Test the Trained Network\n",
    "\n",
    "Finally, we test our best model on previously unseen **test data** and evaluate it's performance. Testing on unseen data is a good way to check that our model generalizes well. It may also be useful to be granular in this analysis and take a look at how this model performs on each class as well as looking at its overall loss and accuracy.\n",
    "\n",
    "#### `model.eval()`\n",
    "\n",
    "`model.eval(`) will set all the layers in your model to evaluation mode. This affects layers like dropout layers that turn \"off\" nodes during training with some probability, but should allow every node to be \"on\" for evaluation!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# initialize lists to monitor test loss and accuracy\n",
    "test_loss = 0.0\n",
    "class_correct = list(0. for i in range(10))\n",
    "class_total = list(0. for i in range(10))\n",
    "\n",
    "model.eval() # prep model for *evaluation*\n",
    "\n",
    "for data, target in test_loader:\n",
    "    # forward pass: compute predicted outputs by passing inputs to the model\n",
    "    output = model(data)\n",
    "    # calculate the loss\n",
    "    loss = criterion(output, target)\n",
    "    # update test loss \n",
    "    test_loss += loss.item()*data.size(0)\n",
    "    # convert output probabilities to predicted class\n",
    "    _, pred = torch.max(output, 1)\n",
    "    # compare predictions to true label\n",
    "    correct = np.squeeze(pred.eq(target.data.view_as(pred)))\n",
    "    # calculate test accuracy for each object class\n",
    "    for i in range(batch_size):\n",
    "        label = target.data[i]\n",
    "        class_correct[label] += correct[i].item()\n",
    "        class_total[label] += 1\n",
    "\n",
    "# calculate and print avg test loss\n",
    "test_loss = test_loss/len(test_loader.dataset)\n",
    "print('Test Loss: {:.6f}\\n'.format(test_loss))\n",
    "\n",
    "for i in range(10):\n",
    "    if class_total[i] > 0:\n",
    "        print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (\n",
    "            str(i), 100 * class_correct[i] / class_total[i],\n",
    "            class_correct[i], class_total[i]))\n",
    "    else:\n",
    "        print('Test Accuracy of %5s: N/A (no training examples)' % (classes[i]))\n",
    "\n",
    "print('\\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (\n",
    "    100. * np.sum(class_correct) / np.sum(class_total),\n",
    "    np.sum(class_correct), np.sum(class_total)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualize Sample Test Results\n",
    "\n",
    "This cell displays test images and their labels in this format: `predicted (ground-truth)`. The text will be green for accurately classified examples and red for incorrect predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# obtain one batch of test images\n",
    "dataiter = iter(test_loader)\n",
    "images, labels = dataiter.next()\n",
    "\n",
    "# get sample outputs\n",
    "output = model(images)\n",
    "# convert output probabilities to predicted class\n",
    "_, preds = torch.max(output, 1)\n",
    "# prep images for display\n",
    "images = images.numpy()\n",
    "\n",
    "# plot the images in the batch, along with predicted and true labels\n",
    "fig = plt.figure(figsize=(25, 4))\n",
    "for idx in np.arange(20):\n",
    "    ax = fig.add_subplot(2, 20/2, idx+1, xticks=[], yticks=[])\n",
    "    ax.imshow(np.squeeze(images[idx]), cmap='gray')\n",
    "    ax.set_title(\"{} ({})\".format(str(preds[idx].item()), str(labels[idx].item())),\n",
    "                 color=(\"green\" if preds[idx]==labels[idx] else \"red\"))"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
